RCT data
Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data
Zhang, Shuli, Zhou, Hao, Zheng, Jiaqi, Jiang, Guibin, Cheng, Bing, Lin, Wei, Chen, Guihai
Online Internet platforms require sophisticated marketing strategies to optimize user retention and platform revenue -- a classical resource allocation problem. Traditional solutions adopt a two-stage pipeline: machine learning (ML) for predicting individual treatment effects of marketing actions, followed by operations research (OR) optimization for decision-making. This paradigm presents two fundamental technical challenges. First, the prediction-decision misalignment: conventional ML methods focus solely on prediction accuracy without considering downstream optimization objectives, so improved predictive metrics often fail to translate into better decisions. Second, the bias-variance dilemma: observational data suffers from multiple biases (e.g., selection bias, position bias), while experimental data (e.g., randomized controlled trials), though unbiased, is typically scarce and costly -- resulting in high-variance estimates. We propose Bi-level Decision-Focused Causal Learning (Bi-DFCL) to systematically address these challenges. First, we develop an unbiased estimator of OR decision quality using experimental data, which guides ML model training through surrogate loss functions that bridge discrete optimization gradients. Second, we establish a bi-level optimization framework that jointly leverages observational and experimental data, solved via implicit differentiation. This formulation enables our unbiased OR estimator to correct the learning directions obtained from biased observational data, achieving an optimal bias-variance tradeoff. Extensive evaluations on public benchmarks, industrial marketing datasets, and large-scale online A/B tests demonstrate the effectiveness of Bi-DFCL, showing statistically significant improvements over the state of the art. Bi-DFCL is currently deployed at Meituan, one of the largest online food delivery platforms in the world.
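To make the decision-focused piece concrete, here is a minimal PyTorch sketch (not the authors' code; the top-k allocation rule, the Bernoulli(0.5) RCT propensity, and all names are illustrative assumptions): a soft budget-constrained allocation scored by an unbiased inverse-propensity estimate of decision quality on RCT data, so gradients flow back into the uplift model.

```python
import torch

def soft_allocation(pred_uplift, budget_frac, temp=0.1):
    # Relax the discrete "treat the top-k users by predicted uplift" rule:
    # the k-th largest prediction acts as a soft threshold.
    k = max(1, int(budget_frac * pred_uplift.numel()))
    thresh = torch.topk(pred_uplift, k).values[-1].detach()
    return torch.sigmoid((pred_uplift - thresh) / temp)  # P(treat) per user

def surrogate_loss(p_treat, t_rct, y_rct, propensity=0.5):
    # Negative IPW estimate of the soft policy's value; unbiased on RCT data
    # because treatment was assigned with a known propensity.
    w = torch.where(t_rct == 1, p_treat / propensity, (1 - p_treat) / (1 - propensity))
    return -(w * y_rct).mean()

# loss = surrogate_loss(soft_allocation(model(x_rct), budget_frac=0.3), t_rct, y_rct)
```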
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > Switzerland (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- Information Technology (0.67)
- Health & Medicine (0.46)
Augmenting Limited and Biased RCTs through Pseudo-Sample Matching-Based Observational Data Fusion Method
Han, Kairong, Huang, Weidong, Zhou, Taiyang, Zhen, Peng, Kuang, Kun
In the online ride-hailing pricing context, companies often conduct randomized controlled trials (RCTs) and use uplift models to assess the effect of discounts on customer orders, which substantially influences competitive market outcomes. However, due to the high cost of RCTs, the proportion of trial data relative to observational data is small -- it accounts for only 0.65% of total traffic in our context -- resulting in significant bias when generalizing to the broader user base. Additionally, the complexity of industrial processes reduces the quality of RCT data, which is often subject to heterogeneity from potential interference and selection bias, making it difficult to correct. Moreover, existing data fusion methods are challenging to implement effectively in complex industrial settings due to the high dimensionality of features and strict assumptions that are hard to verify with real-world data. To address these issues, we propose an empirical data fusion method called pseudo-sample matching. By generating pseudo-samples from biased, low-quality RCT data and matching them with the most similar samples from large-scale observational data, the method expands the RCT dataset while mitigating its heterogeneity. We validated the method through simulation experiments and through offline and online tests on real-world data. In a week-long online experiment, we achieved a 0.41% improvement in profit, a considerable gain when scaled to industrial scenarios with hundreds of millions in revenue. In addition, we discuss how low-quality RCT data harms model training, offline evaluation, and online economic benefits, and emphasize the importance of improving RCT data quality in industrial scenarios. Further details of the simulation experiments can be found in the GitHub repository https://github.com/Kairong-Han/Pseudo-Matching.
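The matching step can be sketched in a few lines with scikit-learn (the paper first generates pseudo-samples from the biased RCT data; this hedged sketch shows only the nearest-neighbor retrieval from observational data, with illustrative names):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pseudo_sample_matching(x_rct, x_obs, y_obs, t_obs, n_neighbors=1):
    # For each RCT sample, pull the most similar observational sample(s)
    # (Euclidean distance in feature space) into an augmented training set.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(x_obs)
    _, idx = nn.kneighbors(x_rct)
    idx = idx.ravel()
    return x_obs[idx], y_obs[idx], t_obs[idx]  # matched pseudo-samples
```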
- Asia > South Korea > Seoul > Seoul (0.05)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- (4 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Transportation > Passenger (0.68)
- Health & Medicine (0.46)
Conditional Average Treatment Effect Estimation Under Hidden Confounders
Aloui, Ahmed, Dong, Juncheng, Hasan, Ali, Tarokh, Vahid
One of the major challenges in estimating conditional potential outcomes and conditional average treatment effects (CATE) is the presence of hidden confounders. Since testing for hidden confounders cannot be accomplished with observational data alone, conditional unconfoundedness is commonly assumed in the CATE estimation literature. Nevertheless, under this assumption, CATE estimation can be significantly biased by the effects of unobserved confounders. In this work, we consider the case where, in addition to a potentially large observational dataset, a small dataset from a randomized controlled trial (RCT) is available. Notably, we make no assumptions on the existence of any covariate information for the RCT dataset; we only require the outcomes to be observed. We propose a CATE estimation method based on a pseudo-confounder generator and a CATE model that aligns the learned potential outcomes from the observational data with those observed in the RCT. Our method is applicable to many practical scenarios of interest, particularly those where privacy is a concern (e.g., medical applications). Extensive numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world datasets.
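A minimal sketch of the alignment idea, assuming a PyTorch CATE model with two potential-outcome heads (the paper aligns the learned potential outcomes themselves, which is richer than the mean-matching shown here; all names are illustrative):

```python
import torch

def rct_alignment_loss(cate_model, x_obs, z_pseudo, y_treated_rct, y_control_rct):
    # Augment observational covariates with a generated pseudo-confounder,
    # then penalize disagreement between the model-implied ATE and the ATE
    # computed from RCT outcomes alone (no RCT covariates required).
    x_aug = torch.cat([x_obs, z_pseudo], dim=1)
    mu1, mu0 = cate_model(x_aug)
    ate_model = (mu1 - mu0).mean()
    ate_rct = y_treated_rct.mean() - y_control_rct.mean()
    return (ate_model - ate_rct) ** 2
```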
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education (1.00)
- (2 more...)
A Two-Stage Pretraining-Finetuning Framework for Treatment Effect Estimation with Unmeasured Confounding
Zhou, Chuan, Li, Yaxuan, Zheng, Chunyuan, Zhang, Haiteng, Zhang, Min, Li, Haoxuan, Gong, Mingming
Estimating the conditional average treatment effect (CATE) from observational data plays a crucial role in areas such as e-commerce, healthcare, and economics. Existing studies mainly rely on the strong ignorability assumption that there are no unmeasured confounders, whose presence cannot be tested from observational data and can invalidate any causal conclusion. In contrast, data collected from randomized controlled trials (RCTs) do not suffer from confounding but are usually limited by a small sample size. In this paper, we propose a two-stage pretraining-finetuning (TSPF) framework using both large-scale observational data and small-scale RCT data to estimate the CATE in the presence of unmeasured confounding. In the first stage, a foundational representation of the covariates is trained to estimate counterfactual outcomes through large-scale observational data. In the second stage, we train an augmented representation of the covariates, which is concatenated to the foundational representation obtained in the first stage to adjust for the unmeasured confounding. To avoid overfitting caused by the small-scale RCT data in the second stage, we further propose a partial parameter initialization approach, rather than training a separate network. The superiority of our approach is validated with extensive experiments on two public datasets. The code is available at https://github.com/zhouchuanCN/KDD25-TSPF.
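A hedged PyTorch sketch of the two-stage structure (layer sizes and names are assumptions, not taken from the paper's code; the partial parameter initialization is only indicated in comments):

```python
import torch
import torch.nn as nn

class TwoStageCATE(nn.Module):
    def __init__(self, d_in, d_found=64, d_aug=16):
        super().__init__()
        self.d_aug = d_aug
        self.found = nn.Sequential(nn.Linear(d_in, d_found), nn.ReLU())  # stage 1
        self.aug = nn.Sequential(nn.Linear(d_in, d_aug), nn.ReLU())      # stage 2
        self.head = nn.Linear(d_found + d_aug, 2)                        # (mu0, mu1)

    def forward(self, x, stage=2):
        f = self.found(x)
        a = self.aug(x) if stage == 2 else torch.zeros(len(x), self.d_aug, device=x.device)
        return self.head(torch.cat([f, a], dim=1))

model = TwoStageCATE(d_in=20)
# Stage 1: train on large observational data with stage=1.
# Stage 2: freeze the foundational representation and fit the augmented one on
# the small RCT set (the paper initializes it partially from stage-1 parameters
# rather than training a separate network):
for p in model.found.parameters():
    p.requires_grad = False
```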
- North America > Canada > Ontario > Toronto (0.05)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (5 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects
Yao, Dong, Tang, Caizhi, Cui, Qing, Li, Longfei
Data from observational studies (OSs) is widely available and readily obtainable yet frequently contains confounding biases. Data from randomized controlled trials (RCTs), on the other hand, helps to reduce these biases, but it is expensive to gather, so randomized datasets are typically very small. For this reason, effectively fusing observational and randomized data to better estimate heterogeneous treatment effects (HTEs) has gained increasing attention. However, existing methods for integrating observational data with randomized data require complete observational data, meaning that both treated and untreated subjects must be included in the OSs. This prerequisite confines such methods to very specific situations, since including all subjects, whether treated or untreated, in observational studies is not always achievable. In this paper, we propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation, which we abbreviate as CIO. CIO can estimate HTEs efficiently regardless of whether the observational data is complete or partial. Concretely, a confounding bias function is first derived using the pseudo-experimental group from the OSs, in conjunction with the pseudo-control group from the RCTs, via an effect estimation procedure. This function is then used as a corrective residual to rectify the observed outcomes of the observational data during HTE estimation, combining the available observational data with all the randomized data. To validate our approach, we conducted experiments on a synthetic dataset and two semi-synthetic datasets.
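The corrective-residual idea can be sketched as follows (a simplification under illustrative names: the paper derives the bias function from a pseudo-experimental OS group and a pseudo-control RCT group, whereas plain control groups stand in here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_confounding_bias(x_obs_ctrl, y_obs_ctrl, x_rct_ctrl, y_rct_ctrl):
    # Outcome model fit on (possibly confounded) untreated observational data...
    mu_obs = GradientBoostingRegressor().fit(x_obs_ctrl, y_obs_ctrl)
    # ...and its residual on the unconfounded RCT controls, learned as b(x).
    resid = mu_obs.predict(x_rct_ctrl) - y_rct_ctrl
    return GradientBoostingRegressor().fit(x_rct_ctrl, resid)

# Before HTE estimation, rectify the observational outcomes:
# y_obs_corrected = y_obs - bias_model.predict(x_obs)
```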
- North America > United States > Idaho > Ada County > Boise (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Tennessee (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Education (1.00)
Training and Validating a Treatment Recommender with Partial Verification Evidence
Unnikrishnan, Vishnu, Puga, Clara, Schleicher, Miro, Niemann, Uli, Langguth, Berthold, Schoisswohl, Stefan, Mazurek, Birgit, Cima, Rilana, Lopez-Escamez, Jose Antonio, Kikidis, Dimitris, Vellidou, Eleftheria, Pryss, Ruediger, Schlee, Winfried, Spiliopoulou, Myra
Current clinical decision support systems (DSS) are trained and validated on observational data from the target clinic. This is problematic for treatments that have been validated in a randomized clinical trial (RCT) but not yet introduced in any clinic. In this work, we report on a method for training and validating the DSS using the RCT data. The key challenges we address concern missingness: a missing rationale for treatment assignment (the assignment is random), and missing verification evidence, since the effectiveness of a treatment for a patient can only be verified (ground truth) for treatments that were actually assigned to that patient. We use data from a multi-armed RCT that investigated the effectiveness of single and combination treatments for 240+ tinnitus patients recruited and treated in 5 clinical centers. To deal with the 'missing rationale' challenge, we re-model the target variable (outcome) to suppress the effect of the randomly assigned treatment and to control for the effect of treatment in general. Our methods are also robust to missing values in features and to a small number of patients per RCT arm. We deal with 'missing verification evidence' by using counterfactual treatment verification, which compares the effectiveness of the DSS recommendations with the effectiveness of the RCT assignments when the two are aligned versus not aligned. We demonstrate that our approach leverages the RCT data for learning and verification by showing that the DSS suggests treatments that improve the outcome. The results are limited by the small number of patients per treatment; while our ensemble is designed to mitigate this effect, the predictive performance of the methods is affected by the small data size. We provide a basis for establishing decision-support routines for treatments that have been tested in RCTs but have not yet been deployed clinically.
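The counterfactual verification step reduces to an alignment comparison; a minimal sketch with illustrative names:

```python
import numpy as np

def counterfactual_verification(recommended, assigned, outcome):
    # Compare mean outcomes for patients whose RCT-assigned treatment matches
    # the DSS recommendation against those where it does not.
    aligned = recommended == assigned
    return outcome[aligned].mean(), outcome[~aligned].mean()

# A useful DSS should show better outcomes in the aligned group; with few
# patients per arm, pair the gap with a permutation test before concluding.
```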
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Otolaryngology (0.46)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
Bayesian Prognostic Covariate Adjustment With Additive Mixture Priors
Vanderbeek, Alyssa M., Sabbaghi, Arman, Walsh, Jon R., Fisher, Charles K.
Effective and rapid decision-making from randomized controlled trials (RCTs) requires unbiased and precise treatment effect inferences. Two strategies to address this requirement are to adjust for covariates that are highly correlated with the outcome, and to leverage historical control information via Bayes' theorem. We propose a new Bayesian prognostic covariate adjustment methodology, referred to as Bayesian PROCOVA, that combines these two strategies. Covariate adjustment in Bayesian PROCOVA is based on generative artificial intelligence (AI) algorithms that construct a digital twin generator (DTG) for RCT participants. The DTG is trained on historical control data and yields a digital twin (DT) probability distribution for each RCT participant's outcome under the control treatment. The expectation of the DT distribution, referred to as the prognostic score, defines the covariate for adjustment. Historical control information is leveraged via an additive mixture prior with two components: an informative prior probability distribution specified based on historical control data, and a weakly informative prior distribution. The mixture weight determines the extent to which posterior inferences are drawn from the informative component versus the weakly informative component. This weight has a prior distribution as well, so the entire additive mixture prior is completely pre-specifiable without involving any RCT information. We establish an efficient Gibbs algorithm for sampling from the posterior distribution, and derive closed-form expressions for the posterior mean and variance of the treatment effect parameter, conditional on the weight, in Bayesian PROCOVA. We evaluate the efficiency gains of Bayesian PROCOVA via its bias control and variance reduction relative to frequentist PROCOVA in simulation studies that encompass different discrepancies between the historical control data and the RCT. These gains translate into smaller RCTs.
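As a toy illustration of the additive mixture prior (not the paper's full model, which also adjusts for the prognostic score and places a prior on the weight sampled via Gibbs), here is the conjugate-normal posterior mean for a scalar effect:

```python
import numpy as np

def mixture_posterior_mean(y_diff, sigma2, mu_inf, tau2_inf, mu_weak, tau2_weak, w):
    # Two-component normal mixture prior on the effect; conjugate updates plus
    # marginal-likelihood reweighting of the mixture weight.
    n, ybar = len(y_diff), float(np.mean(y_diff))

    def component_posterior_mean(mu0, tau2):
        var = 1.0 / (n / sigma2 + 1.0 / tau2)
        return var * (n * ybar / sigma2 + mu0 / tau2)

    def marginal(mu0, tau2):  # density of ybar ~ N(mu0, sigma2/n + tau2)
        s = sigma2 / n + tau2
        return np.exp(-0.5 * (ybar - mu0) ** 2 / s) / np.sqrt(2 * np.pi * s)

    wi = w * marginal(mu_inf, tau2_inf)
    ww = (1 - w) * marginal(mu_weak, tau2_weak)
    wi = wi / (wi + ww)  # updated weight on the informative component
    return (wi * component_posterior_mean(mu_inf, tau2_inf)
            + (1 - wi) * component_posterior_mean(mu_weak, tau2_weak))
```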
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Law (0.68)
- Government > Regional Government > North America Government > United States Government > FDA (0.47)
Improving uplift model evaluation on RCT data
Bokelmann, Björn, Lessmann, Stefan
Estimating treatment effects is one of the most challenging and important tasks of data analysts. In many applications, such as online marketing and personalized medicine, treatment needs to be allocated to the individuals for whom it yields a high positive treatment effect. Uplift models help select the right individuals for treatment and maximize the overall treatment effect (uplift). A major challenge in uplift modeling concerns model evaluation. Previous literature suggests methods like the Qini curve and the transformed outcome mean squared error. However, these metrics suffer from high variance: their evaluations are strongly affected by random noise in the data, which renders their signals partly arbitrary. We theoretically analyze the variance of uplift evaluation metrics and derive possible variance reduction methods, which are based on statistical adjustment of the outcome. We derive simple conditions under which the variance reduction methods improve the uplift evaluation metrics and empirically demonstrate their benefits on simulated and real-world data. Our paper provides strong evidence in favor of applying the suggested variance reduction procedures by default when evaluating uplift models on RCT data.
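The suggested adjustment amounts to evaluating uplift metrics on residualized outcomes; a minimal sketch (fitting the outcome model on the control group is one of several valid choices, and the names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjust_outcome(x, y, t):
    # Subtract a treatment-agnostic outcome prediction f(x), fit on controls.
    # In an RCT this leaves the expected uplift signal unchanged while
    # removing outcome variance explained by covariates.
    f = LinearRegression().fit(x[t == 0], y[t == 0])
    return y - f.predict(x)

# Feed the adjusted outcome into the Qini curve or the transformed-outcome
# MSE exactly as before.
```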
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.67)